Integrated drug resistance and leukemic stemness gene-expression scores predict outcomes in large cohort of over 3500 AML patients from 10 trials

In this study, we leveraged machine-learning tools to evaluate the expression of genes of pharmacological relevance to standard AML chemotherapy (ara-C/daunorubicin/etoposide) in a discovery cohort of pediatric AML patients (N = 163; NCT00136084) and defined a 5-gene drug resistance score (ADE-RS5) that was predictive of outcome (high MRD1 positivity p = 0.013; lower EFS p < 0.0001 and OS p < 0.0001). ADE-RS5 was integrated with a previously defined leukemic-stemness signature (pLSC6) to classify patients into four groups. ADE-RS5, pLSC6 and the integrated score were evaluated for association with outcome in one of the largest assemblies of AML patients, ~3600 patients from 10 independent cohorts (1861 pediatric and 1773 adult AML). Patients with high ADE-RS5 had poor outcome in validation cohorts, and the previously reported pLSC6 maintained a strong, significant association in all validation cohorts. For pLSC6/ADE-RS5-integrated-score analysis, using Group-1 (low scores for ADE-RS5 and pLSC6) as reference, Group-4 (high scores for ADE-RS5 and pLSC6) showed the worst outcome (EFS: p < 0.0001 and OS: p < 0.0001). Groups-2/3 (one high and one low score) showed intermediate outcome (p < 0.001). Integrated score groups remained an independent predictor of outcome in multivariable analysis after adjusting for established prognostic factors (EFS: Group 2 vs. 1, HR = 4.68, p < 0.001; Group 3 vs. 1, HR = 3.22, p = 0.01; and Group 4 vs. 1, HR = 7.26, p < 0.001). These results highlight the significant prognostic value of transcriptomics-based scores capturing disease aggressiveness through pLSC6 and drug resistance via ADE-RS5. The pLSC6 stemness score is a significant predictor of outcome and associates with high-risk group features, and the ADE-RS5 drug resistance score adds further value, reflecting the clinical utility of simultaneous testing of both for optimizing treatment strategies.

predictive of treatment outcomes in AML patients. To fulfill this goal, we cataloged a list of 67 genes involved in the metabolism or transport of ara-C, daunorubicin or etoposide and their potential drug targets. These genes can contribute to the emergence of drug resistance through various mechanisms, such as: (1) reduced cellular uptake due to low levels of uptake transporters; (2) increased efflux due to high expression of efflux transporters; (3) decreased expression or activity of enzymes responsible for the activation of pro-drugs; (4) increased expression or activity of enzymes responsible for drug inactivation; (5) alterations in the expression or function of the molecular targets of the drugs. Although the impact of these key players on drug pharmacokinetics or pharmacodynamics is well established, a comprehensive transcriptomic evaluation of these genes using machine learning tools to develop a drug resistance signature has not been performed in AML. Previously, a Least Absolute Shrinkage and Selection Operator (LASSO) based regression analysis was used to define a leukemic stemness score, consisting of the expression levels of 17 genes, that was predictive of outcome 6. A follow-up work defined a pediatric leukemic stemness score consisting of 6 genes in AML 7. Within ALL, LASSO analysis has been utilized to define prognostic risk factors 8.
In this study, we evaluated the transcriptome of 67 pharmacologically relevant genes (listed in Table S1) in pediatric AML patients treated on the AML02 multi-center clinical trial. We utilized LASSO penalized regression on clinical outcome data to examine the significance of these genes and developed an ADE-Resistance Score (ADE-RS5) that was further validated in 10 independent AML cohorts. Recently, our group developed a six-gene leukemic stem cell score (pLSC6) that was associated with risk groups and patient outcomes in pediatric AML 7. We further combined the pLSC6 and ADE-RS5 score groups to incorporate both disease aggressiveness, as implied by the stemness score, and drug resistance, as reflected by the resistance score, and evaluated the combined groups across 10 cohorts of pediatric and adult AML patients, totaling 3634 individuals.

Results
Expression of five pharmacological genes defines a drug resistance score of prognostic value in AML02 discovery cohort
A LASSO penalized Cox regression model using mRNA expression levels of 67 genes with EFS in 163 patients (model-development cohort) treated on the multi-site AML02 trial identified five genes that had non-zero coefficient estimates in at least 950 of 1000 leave-10%-out cross-validation replications of this analysis (Fig. 1 and Supplementary Fig. 1). This rigorous model-development process defined a five-gene ADE-Resistance Score (ADE-RS5), which was computed for each patient as the sum of the gene expression values weighted by their regression coefficients. Each unit increase in ADE-RS5 was associated with a 7.32-fold increase in the rate of EFS events (p < 0.00001, 95% CI = 3.75-14.28) in a simple single-predictor Cox regression model. Dichotomization by recursive partitioning classified patients into two groups: low ADE-RS5 (n = 98 patients, 60%) or high ADE-RS5 (n = 65 patients, 40%). Though ADE-RS5 score groups did not differ by age, gender, race, risk group, FLT3-ITD status or WBC count at

Integration of ADE-RS5 score with previously established pLSC6 score in AML02 discovery cohort
We previously developed a clinically significant leukemic stemness score in pediatric AML and designated it as pLSC6 (derived from expression levels of DNMT3B, GPR56, CD34, SOCS2, SPINK2, and FAM30A). Patients in the low pLSC6 score group previously showed better outcome compared with the high pLSC6 group 7. ADE-RS5 was tested within the pLSC6 score groups.
With respect to survival outcomes, patients in Group 4 had lower EFS (HR = 8.89, p < 0.0001) and OS (HR = 12.68, p < 0.0001) compared with patients in Group 1 (Fig. 2D). Patients in Groups 2 and 3 showed intermediate outcomes that were significantly poorer than those of Group 1 (all p < 0.005, Fig. 2D).
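The four-group integration of the two dichotomized scores can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: Group 1 (both scores low) and Group 4 (both scores high) follow the definitions used in this report, whereas the assignment of the two mixed combinations to Group 2 versus Group 3, the function name, and the patient identifiers are hypothetical.

```python
def integrated_group(plsc6_high: bool, ade_rs5_high: bool) -> int:
    """Return an integrated pLSC6/ADE-RS5 group number (1-4).

    Group 1 (both low) and Group 4 (both high) follow the text.
    ASSUMPTION: Group 2 = low pLSC6 / high ADE-RS5 and
    Group 3 = high pLSC6 / low ADE-RS5; this mapping is illustrative only.
    """
    if not plsc6_high and not ade_rs5_high:
        return 1
    if plsc6_high and ade_rs5_high:
        return 4
    return 3 if plsc6_high else 2


# Hypothetical patients: (pLSC6 high?, ADE-RS5 high?)
patients = {
    "P1": (False, False),
    "P2": (False, True),
    "P3": (True, False),
    "P4": (True, True),
}
groups = {pid: integrated_group(*flags) for pid, flags in patients.items()}
```

Keeping the two mixed combinations as separate groups, rather than merging them, allows each to be compared against Group 1 on its own.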
Validation of transcriptomic based prognostic scores in >3000 patients from independent pediatric and adult clinical trials
We performed analysis of pLSC6, ADE-RS5 and integrated scores by combining all the pediatric cohorts together (4 different trials, total n = 1861) and all the adult cohorts together (5 different trials, total n = 1669). The distribution of patient characteristics by pLSC6, ADE-RS5 and integrated pLSC6/ADE-RS5 scores across pediatric and adult validation cohorts is provided in Table 1. Overall, consistent with our previous report, the pLSC6 score group was significantly associated with patients' risk group assignment, cytogenetics and FLT3 status. In addition to these factors, ADE-RS5 was associated with age in the combined pediatric cohort and with gender in the combined adult cohort.
Age-stratified analysis of adults less than 65 years old and elderly patients ≥65 years old showed pLSC6 (pLSC6 low vs. high: <65 yrs, HR = 2.06, p < 0.00001; ≥65 yrs, HR = 2.02, p < 0.00001; Supplementary Fig. 5A, C) and ADE-RS5 (ADE-RS5 low vs. high: <65 yrs, HR = 1.37, p < 0.00001; ≥65 yrs, HR = 1.21, p = 0.093; Supplementary Fig. 5E, F) to be associated with OS. The integrated scores remained a significant predictor of OS in the two age groups (Supplementary Fig. 5I, K). In the multivariable analysis adjusting for risk group assignment and FLT3-ITD mutation, pLSC6 and the integrated scores remained significant independent predictors of OS in both age groups (Supplementary Fig. 5). Given that cytogenetically normal (CN) AML patients constitute a significant proportion of patients and experience highly heterogeneous responses, we evaluated ADE-RS5, pLSC6 and integrated scores within this subgroup in all the 9 cohorts as well as in an additional cohort of CN patients from the GSE71014 dataset. Consistent with the results from the whole cohort, CN-AML patients with high-pLSC6/high-ADE-RS5 scores experienced significantly lower EFS and OS compared with the low-pLSC6/low-ADE-RS5 score group in pediatric and adult cohorts (Supplementary Fig. 6A, B). In multivariable analysis adjusting for age, WBC count at diagnosis and FLT3 status, pLSC6, ADE-RS5, and integrated score groups remained significant independent predictors of outcomes in pediatric and adult CN patients (Supplementary Fig. 6A, B).
Additionally, hematopoietic stem cell transplant (HSCT) can have a significant impact on outcome, and we previously showed that patients with a high pLSC6 score do not benefit from HSCT in the AML02 cohort 7. Though HSCT information was not available in all cohorts, we evaluated HSCT as a time-dependent variable for pLSC6, ADE-RS5 and the integrated score in the 4 cohorts with available data. As shown in Supplementary Fig. 7, the score groups remained significant predictors of EFS and OS.
In addition to the analysis performed in the combined cohorts for pediatric and adult AML, each cohort was evaluated individually. Figure 5 shows a summary of results for association of both pLSC6 and ADE-RS5 scores in individual cohorts for EFS (5 pediatric and 3 adult AML cohorts, N = 3330) and OS (5 pediatric and 5 adult AML cohorts, total N = 3693). Consistent with the results from the discovery cohort, pLSC6 was significantly associated with EFS (Fig. 5A) and OS (Fig. 5B) in all individual cohorts tested, with a common effect of HR = 1.95, 95% CI = 1.78-2.14, p < 0.00001 for association with EFS, and HR = 2.06, 95% CI = 1.88-2.26, p < 0.00001 for association with OS. ADE-RS5 was significantly associated with EFS in all cohorts (p < 0.01) except for the AML08 (p = 0.07) and Leucegene (p = 0.55) cohorts, and with OS in all cohorts (p < 0.01) except for the AML08 (p = 0.12), Beat AML (p = 0.8) and Leucegene (p = 0.68) cohorts, with a common effect of HR = 1.34, 95% CI = 1.23-1.46, p < 0.00001 for association with EFS, and HR = 1.45, 95% CI = 1.32-1.59, p < 0.00001 for association with OS (Fig. 5C, D). Figure 5E-J shows the results for the integrated pLSC6-ADE-RS5 score (Groups 2-4 vs. Group 1), again showing Group 4 with the worst outcome compared with Group 1.
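A "common effect" hazard ratio across cohorts is often obtained by inverse-variance pooling of the per-cohort log hazard ratios. The Python sketch below illustrates that calculation. It is an assumption that this is the pooling approach behind the summary estimates in Fig. 5, since the exact meta-analytic method is not described here, and the per-cohort inputs are hypothetical values, not results from the study.

```python
import math

Z95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def log_hr_and_se(hr, ci_low, ci_high):
    """Recover log(HR) and its standard error from a reported 95% CI."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z95)
    return math.log(hr), se


def pool_common_effect(estimates):
    """Inverse-variance (common-effect) pooling of (hr, ci_low, ci_high) tuples."""
    total_weight = 0.0
    weighted_sum = 0.0
    for hr, ci_low, ci_high in estimates:
        log_hr, se = log_hr_and_se(hr, ci_low, ci_high)
        weight = 1.0 / se ** 2  # more precise cohorts get more weight
        total_weight += weight
        weighted_sum += weight * log_hr
    pooled_log_hr = weighted_sum / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return (
        math.exp(pooled_log_hr),
        math.exp(pooled_log_hr - Z95 * pooled_se),
        math.exp(pooled_log_hr + Z95 * pooled_se),
    )


# Hypothetical per-cohort hazard ratios with 95% CIs (NOT values from the study).
cohorts = [(2.0, 1.5, 2.67), (1.8, 1.2, 2.7), (2.2, 1.4, 3.46)]
pooled_hr, pooled_low, pooled_high = pool_common_effect(cohorts)
```

Pooling on the log scale keeps the confidence interval symmetric around the log hazard ratio, which is why the limits are exponentiated only at the end.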

Discussion
Cytarabine, daunorubicin and etoposide (ADE) are commonly used for induction of remission and intensification in pediatric AML. A combination of cytarabine and anthracyclines is the mainstay of treatment in adults. However, development of chemotherapeutic resistance is a major cause of AML treatment failure 3,5. In recent years, significant effort has been devoted to transcriptomics-based prognostic factors, including the leukemic stemness score (LSC17) reported in 2016 in adult AML 6. Our group had previously leveraged the leukemia stemness genes identified by Ng et al. 6 and, using outcome data from pediatric AML, developed a pediatric leukemic stemness score composed of 6 genes 7. In addition to leukemic stemness, which defines disease aggressiveness, development of drug resistance is an inherent clinical challenge. In this study, we used a similar strategy to define a chemotherapeutic resistance score focused on key genes of pharmacological relevance (pharmacokinetics/pharmacodynamics) to ADE. After running LASSO regression on key genes of pharmacological relevance to ADE, we defined an ADE-RS5 score that was computed for each patient based on the expression levels of five genes multiplied by their regression coefficients. These five genes included (i) deoxycytidylate deaminase (DCTD), a deaminase involved in ara-CMP to ara-UMP conversion; (ii) ATP Binding Cassette Subfamily C Member 1 (ABCC1), an efflux transporter implicated in daunorubicin and etoposide efflux; (iii) Myeloperoxidase (MPO), involved in etoposide-catechol to quinone conversion 9 and also a myeloid cell specific marker 10; (iv) Topoisomerase II alpha (TOP2A), a daunorubicin and etoposide target 11; and (v) Carbonyl Reductase 1 (CBR1), involved in reduction of daunorubicin to daunorubicinol 12,13. Drug metabolism is a very complicated process involving influx and efflux transporters, activating and inactivating enzymes, and the dynamic interactions between them, making it very challenging to study all of these simultaneously. Thus, alternative approaches such as the one used here provide some insight into drug responsiveness governed by pharmacological genes. To the best of our knowledge, this is one of the first studies to apply this approach to establish a drug resistance score that holds prognostic value and is predictive of survival outcomes.
Further, the previously established pLSC6 and the newly developed ADE-RS5 scores were evaluated as prognostic factors in 9 independent pediatric and adult AML cohorts totaling more than 3000 patients. The pLSC6 score was validated in each cohort and within the cytogenetically normal group, as well as within patients younger and older than 65 yrs. This is in contrast to a recent observation where LSC17 was not associated with EFS and OS in patients ≥60 yrs of age 14.
Furthermore, the ADE-RS5 score predicted outcome within low and high pLSC6 groups, indicating that it offers additional prognostic value beyond that captured by the pLSC6 score alone. Thus, a four-group classifier system (Group 1 to Group 4) was developed for patients. Integrated stemness and drug-resistance score groups predicted outcome in both pediatric and adult AML patients, as well as within different cytogenetic subgroups, including CN-AML. Group 1, representing patients with low pLSC6 and low ADE-RS5 scores, had the most favorable outcome, and Group 4, with both scores high, had the poorest outcomes. In addition, pLSC6, ADE-RS5, and the integrated score groups were significant and independent predictors of poor outcomes after adjusting for risk group assignment, age, FLT3-ITD mutation and WBC count at diagnosis. ADE-RS5 was not validated in the Beat AML and Leucegene cohorts, and we believe this may be due to the older age of the patients, different frequencies of cytogenetic risk categories, treatment regimens without etoposide, and the potential effect of transplant. Gene expression levels of all genes that are part of LSC17 were not available in all cohorts due to the type of array used; however, we evaluated LSC17 groups as previously described, and they remained a significant predictor of OS. Combination of ADE-RS5 and LSC17 showed added value of ADE-RS5 in predicting survival (Supplementary Fig. 8). Although this is one of the few studies with large patient samples across multiple cohorts, there are some limitations, such as non-uniform treatment protocols across the cohorts, continued updates to the AML classification resulting in changes to the initial risk group classification in older trials, variability in the post-induction treatment protocols across trials and centers, lack of availability of EFS data and time to transplant in some adult cohorts, and lack of mechanistic studies supporting the functional relevance of some of the genes that are part of the score.
Race was not available in the adult cohorts; Discovery cohort AML02 is represented in Supplementary Table 3 and not included in the Pediatric combined dataset; GSE71014 dataset of CN patients from Taiwan was not added to the combined adult datasets analysis.
EFS event-free survival, OS overall survival, CR complete remission.
In conclusion, this report highlights the significant prognostic value of multi-gene transcriptomics-based scores that include the assessment of disease aggressiveness through the pLSC6 score and drug resistance via the ADE-RS5 score. Our analysis reveals that the pLSC6 stemness score is a significant predictor of outcome and associates with high-risk group features, and that the ADE-RS5 drug resistance score adds further value, reflecting the clinical utility of simultaneous testing of both to optimize treatment strategies. One notable aspect of this study is the evaluation of nine entirely independent clinical cohorts, including both pediatric and adult AML patients from various countries. Evaluation of only 6 genes highlights the simplicity of clinical use of pLSC6. Future clinical translation of these results can be accelerated by use of a simple method for quantification of the 11 genes, such as one based on RT-PCR or a NanoString-based assay; we have previously shown consistency of the pLSC6 score across three platforms: U133A, RNAseq and RT-PCR 7. Future work is focused on developing a web-based tool that will allow other investigators to utilize our signatures to predict treatment outcomes and refine patient classification.
(EFS) and overall survival (OS) have been described 15. Gene expression profiling of leukemic blasts obtained at diagnosis in the AML02 discovery cohort was performed using GeneChip® Human Genome U133A [Affymetrix, Santa Clara, CA] as described previously 16. The MAS 5.0 algorithm was used to obtain normalized gene expression signals. Expression data for 67 genes of relevance to ADE pharmacology (listed in Supplementary Table S1) were extracted and log2 transformed before the analysis.
Validation cohorts
AML patient cohorts with both gene expression data from diagnostic specimens and clinical outcome data available were included in the validation studies. Patients diagnosed with myelodysplastic syndrome (MDS), myelodysplastic syndrome refractory anemia with excess blasts (MDS-RAEB), Down's syndrome-related AML or acute promyelocytic leukemia (APL; FAB-M3), as well as data from specimens not obtained at diagnosis or those missing survival data, were excluded from the study. The validation cohorts are summarized below and listed in Fig. 1 (additional details are provided in the Supplementary Material). All the cohorts were evaluated for association between transcriptomic scores and clinical outcome endpoints individually as well as combined into pediatric and adult AML datasets. Use of data and/or specimens was approved by the respective protocol or institutional review boards, and informed consent was obtained from parents/guardians or patients, and assent from the patients, as appropriate, in accordance with the approved clinical trial protocols and the Declaration of Helsinki. The study was approved by the University of Florida Institutional Review Board.
Pediatric AML-Children's Oncology Group (COG) AAML1031. This dataset included 941 pediatric AML patients treated under COG-AAML1031 (NCT01371981). RNAseq and clinical outcome data were provided by COG or obtained from the TARGET-AML project (https://ocg.cancer.gov/programs/target/projects/acute-myeloid-leukemia). Details on the clinical trial and outcome have been previously published 20.
Pediatric AML-AML08 cohort. This dataset included 122 pediatric AML patients treated under the multi-center AML08 clinical trial (NCT00703820) 21. RNA samples from diagnosis were available from 122 patients, and gene expression data on the 11 genes of interest were generated using a TaqMan-based assay as detailed in the Supplementary Material. Details on the clinical trial and outcome have been previously published 21.
Pediatric AML-GSE17855 cohort. For this cohort, data from 197 pediatric AML patients (following the exclusion criteria listed above) were included in this study. Patients received treatment on 8 different trials. Expression data generated using the U133 Plus array were downloaded from the Gene Expression Omnibus (GEO) database (GSE17855).
Adult AML-GSE68833-The Cancer Genome Atlas (TCGA) cohort. This dataset included 165 adult AML patients with publicly available clinical and gene expression data. U133 Plus microarray gene expression data were downloaded for this group of patients from the Gene Expression Omnibus database (GSE68833). RNA-Seq gene expression data for 153 patients were also available for this cohort.
Adult AML-GSE37642. This dataset included 374 adult AML patients treated in the German AMLCG-1999 trial 22 with publicly available gene expression data generated using the U133A array 23.
Adult AML-GSE6891. This dataset included 417 adult AML patients treated according to multiple sequential Dutch-Belgian Hemato-Oncology Cooperative Group and Swiss Group for Clinical Cancer Research HOVON trials, with publicly available gene expression data generated using the U133 Plus array.
Adult AML-BeatAML. Clinical data were downloaded from http://www.vizome.org/aml/ and merged with clinical data downloaded from cBioPortal-OHSU 24. After applying the exclusion criteria indicated above, 198 patients were included in the current study.
Adult AML-Leucegene AML cohort. This dataset included 515 adult patients with newly diagnosed AML who were treated with intensive induction chemotherapy (7 + 3 based regimens) in Quebec (Canada) between 2001 and 2019. Diagnostic bone marrow or peripheral blood samples were collected and stored by the Quebec leukemia cell bank (bclq.org). Gene expression data were generated with whole transcriptome sequencing using an Illumina HiSeq 2000 sequencing system as part of the Leucegene project (leucegene.ca), and clinical data were collected and validated by the Quebec leukemia cell bank (details in the Supplementary Material).
All the gene expression data were log2 transformed before analysis. RNA-Seq data were normalized as reads per kilobase of transcript per million mapped reads (RPKM) or transcripts per million (TPM). We used log2(RPKM + 1) or log2(TPM + 1) values for subsequent statistical analysis. Supplementary Table 2 provides a list of probe/assay IDs for the 11 genes that constitute the pLSC6 and ADE-RS5 scores.
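As a small illustration of the RNA-Seq transformation described above, the following Python sketch applies log2(x + 1) to normalized values. The function name and the input values are hypothetical. Adding 1 before taking the logarithm keeps genes with zero measured expression well defined, mapping them to 0.

```python
import math


def log2_plus_one(values):
    """Apply log2(x + 1) to normalized RNA-Seq values (RPKM or TPM)."""
    return [math.log2(v + 1.0) for v in values]


# Hypothetical TPM values for one gene across four samples.
tpm = [0.0, 1.0, 3.0, 1023.0]
transformed = log2_plus_one(tpm)  # [0.0, 1.0, 2.0, 10.0]
```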

Clinical Outcome endpoint definitions
Minimal residual disease after induction I course of treatment (MRD1) was defined as one or more leukemic cells per 1000 mononuclear cells (≥0.1%). Event-free survival (EFS) was defined in the AML02 discovery cohort as the time from study enrollment to induction failure, relapse, second malignancy, refusal of therapy, removal from therapy because of unacceptable toxicity, or death, with patients who had not experienced any of these events censored at last follow-up. The definition of EFS in the other clinical trials is described in the respective clinical trial outcome reports cited above or in the supplemental information. Overall survival (OS) was defined as the time from study enrollment to death, with living patients censored at last follow-up.
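The endpoint definitions above translate into simple rules, sketched here in Python. This is a minimal illustration, not the study's data-processing code: the function names, the use of days as the time unit, and the example dates are all assumptions.

```python
from datetime import date


def mrd1_positive(leukemic_cells: int, mononuclear_cells: int) -> bool:
    """MRD1 positive if >= 1 leukemic cell per 1000 mononuclear cells (>= 0.1%)."""
    return leukemic_cells * 1000 >= mononuclear_cells


def overall_survival(enrolled: date, last_follow_up: date, death=None):
    """Return (time_in_days, event) where event is 1 for death, 0 for censoring.

    ASSUMPTION: time is measured in days from study enrollment.
    """
    if death is not None:
        return (death - enrolled).days, 1
    return (last_follow_up - enrolled).days, 0


# Hypothetical patient who was alive at last follow-up (censored observation).
os_time, os_event = overall_survival(date(2005, 1, 1), date(2006, 1, 1))
```

The (time, event) pair is the form expected by standard survival methods such as Cox regression and Kaplan-Meier estimation.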

Development of ADE-RS score
We applied a least absolute shrinkage and selection operator (LASSO) Cox regression model, as implemented in the glmnet package of the R 3.6.0 statistical software (www.r-project.org), to the gene expression levels (67 genes of pharmacological relevance to ADE) and the EFS data of patients from the AML02 discovery cohort. To evaluate the variability and reproducibility of the LASSO Cox regression model estimates, we repeated the LASSO Cox regression fitting process for each of 1000 leave-10%-out cross-validation evaluations. Genes with non-zero coefficient estimates in at least 950 of these 1000 evaluations were retained. The final model coefficients were obtained by averaging the coefficient estimates obtained across the set of cross-validation evaluations. We further utilized a recursive partitioning survival model, as implemented in the rpart package, to dichotomize ADE-resistance scores into "low" and "high" score groups (60% as low and 40% as high).
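The aggregation step of this procedure (retain genes that are selected in nearly all resampling replicates, average their coefficients, then score each patient) can be sketched in Python as follows. The sketch does not perform the LASSO Cox fitting itself, which was done with glmnet in R. It assumes the per-replicate coefficients are already available, and it assumes that the average is taken over all replicates with dropped genes counted as zero. Gene names, coefficient values, and function names are hypothetical placeholders rather than the study's estimates.

```python
def stable_genes(replicate_coefs, min_hits):
    """Keep genes with non-zero coefficients in at least `min_hits` replicates.

    replicate_coefs: list of dicts mapping gene -> coefficient from one
    resampling replicate (a gene dropped by the penalty is absent or 0.0).
    ASSUMPTION: coefficients are averaged over every replicate, counting 0.0
    where a gene was dropped.
    """
    n = len(replicate_coefs)
    genes = set().union(*replicate_coefs)
    selected = {}
    for gene in sorted(genes):
        values = [rep.get(gene, 0.0) for rep in replicate_coefs]
        hits = sum(1 for v in values if v != 0.0)
        if hits >= min_hits:
            selected[gene] = sum(values) / n
    return selected


def resistance_score(expression, coefs):
    """Weighted sum of log2 expression values over the retained genes."""
    return sum(coefs[g] * expression[g] for g in coefs)


# Toy example: 20 replicates with a 19-of-20 rule (the study used 950 of 1000).
replicates = []
for i in range(20):
    rep = {"GENE_A": 0.5}          # selected in every replicate
    if i < 19:
        rep["GENE_B"] = -0.25      # selected in 19 of 20 replicates
    if i < 10:
        rep["GENE_C"] = 0.1        # selected in only 10 replicates: dropped
    replicates.append(rep)

coefs = stable_genes(replicates, min_hits=19)
score = resistance_score({"GENE_A": 8.0, "GENE_B": 4.0, "GENE_C": 6.0}, coefs)
```

Requiring selection in nearly every replicate guards against genes that enter the model only for particular subsets of patients.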
Integrated pLSC6/ADE-RS5 score groups
The pLSC6 score was generated based on the expression levels of six genes (DNMT3B, GPR56, CD34, SPINK2, SOCS2, FAM30A) multiplied by their regression coefficients as defined previously 7. Patients were classified into low or high pLSC6 groups as defined previously. Based on the combination of the pLSC6 and ADE-RS5 score group designations, patients were further grouped as described in the Results section. Association between pLSC6, ADE-RS5, and integrated score groups and clinical outcome endpoints was analyzed at the individual cohort level in pediatric AML datasets that included COG-cohort 1 (N = 601), COG-cohort 2 (N = 941), AML08
Fig. 2 panel titles: pLSC6/ADE-RS5 and EFS in AML02; 2F) ADE-RS5 and OS in AML02; 2H) Integrated pLSC6/ADE-RS5 and OS in AML02

Table 1 | Distribution of patient characteristics in combined pediatric and adult AML validation cohorts (N
